Getting started

This is an R notebook that uses Stan, a probabilistic programming language, to fit Bayesian models. This notebook is meant to serve as an example of reproducible research, fully documenting an analytical workflow, and also to demonstrate some best practices in Bayesian model assessment.

The analysis presented below is based on Rand, Arbesman, and Christakis (2011) (available here) where the authors consider the role of social networks in the development of large-scale cooperation among individuals in an economic game. The authors use a logistic regression model to examine individuals’ decisions (cooperation or defection) and show that social networks which can be frequently updated by participants (rather than fixed throughout the course of the game or randomly updated) foster cooperative decisions in this setting.

Although the code and specific models provided here are particular to this dataset, one goal of this notebook is to provide a general outline of a “good” workflow for Bayesian analyses. In different settings or with different data, each section outlined here might be implemented differently (for example, adding an imputation model to deal with missing covariate data or utilizing a more complex model). However, we argue that by documenting the complete workflow – from data collection to answering research hypotheses to model checking – as we have done here, research becomes more reproducible and, additionally, cross-study comparisons can be more easily tackled.

Software

This notebook uses the following R packages:

and was built using R version 3.4.0. Please be sure that all packages (and dependencies) listed above are installed and that the version of R on your machine is up to date.

A reproducible workflow

To increase transparency, we aim to document a (perhaps simplified) workflow in its entirety: from the generation of research hypotheses, through data collection, cleaning, and modeling, all the way to uncertainty evaluation and prediction.

Research goals

Research questions

Generally, Rand, Arbesman, and Christakis (2011) are interested in considering the role of a social network in fostering cooperative actions among a group of individuals. Although cooperation is beneficial for large groups, at the individual level, cooperative actions often leave the individual exposed to exploitation. Some evolutionary game theory models suggest that the social network of individuals can help explain the proliferation of cooperative strategies. See Rand, Arbesman, and Christakis (2011) for more details.

Previous empirical research has only considered fixed, non-dynamic social networks. The authors argue that implementing dynamically-updated networks allows for a new type of conditional action, where individuals may change their decision to cooperate based on changes in the network structure.

Testable hypotheses

The authors hypothesize that implementing frequent user-updated social networks will more strongly promote cooperation, relative to other conditions.

In their paper, the authors discuss other interesting hypotheses, but for demonstration we will focus only on this particular hypothesis.

Study design

Sampling

Subjects were recruited using the online labor market Amazon Mechanical Turk (see the Supporting Information for Rand, Arbesman, and Christakis (2011) for more details).

Experimental design

The authors randomly assigned 785 participants to one of four conditions (see below) in a series of 40 realizations of network experiments (average network size = 19.6; SD = 6.4). In all conditions, subjects play a repeated cooperative dilemma (each game/session consisted of ## rounds) in an artificial social network created in the virtual laboratory:

  • Each player resides on an unweighted network, with 20% of all possible links connected randomly.
  • His/her neighbors are players connected to him/her in this network.
  • During each round of the game, each player can choose one of the following two actions:
    • Cooperation: donate 50 units per neighbor; each neighbor actually gains 100 units
    • Defection: donate nothing; neighbors get nothing
  • Before each round, players are reminded of their number of neighbors and these neighbors’ prior decisions.
  • After each round, players learn about the decisions of their neighbors and their own payoff.
  • The probability of rewiring (i.e., changing connections in the network) for each subsequent round is 80%, which is communicated to players.
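For concreteness, the payoff rules above can be sketched as a small R helper (a hypothetical illustration, not part of the original analysis code):

```r
# Payoff for one player in one round, under the rules above:
# cooperating costs 50 units per neighbor; each cooperating neighbor
# delivers 100 units to every one of his/her neighbors.
round_payoff <- function(cooperate, n_neighbors, n_coop_neighbors) {
  cost <- if (cooperate) 50 * n_neighbors else 0
  gain <- 100 * n_coop_neighbors
  gain - cost
}

# A defector with 3 neighbors, 2 of whom cooperate, nets 200 units:
round_payoff(FALSE, n_neighbors = 3, n_coop_neighbors = 2)  # 200
```

Note that defecting while one's neighbors cooperate is always individually profitable, which is exactly the exploitation risk described above.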

Additionally, the experimenters implemented a variety of different conditions/settings for the behavior of the network in the game:

  • Static (fixed) links
  • Random link updating (i.e., the entire network is regenerated at each round)
  • Strategic link updating: a rewiring step is performed after each round, in which subject pairs are randomly selected and one randomly selected actor of the pair is given the option to change the link status of the pair (connected to disconnected, or disconnected to connected). The deciding actor is shown the alter’s behavior during the previous round. At the end of the rewiring step, each player receives summaries about updates to the network that involve him/her.
    • Viscous: 10% of subject pairs are selected
    • Fluid: 30% of subject pairs are selected

Data cleaning

Import data

This dataset has been made publicly available (here), which greatly improves reproducibility.

Below, we display 50 randomly selected rows from this dataset:

Handle implausible values

We should check for any missing or implausible values in the imported data set:
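Base R makes this kind of check straightforward; for example (shown on a toy data frame so the snippet is self-contained; the notebook applies the same idea to the full cooperation data, producing the summary below):

```r
# Toy stand-in for the cooperation data frame
cooperation_toy <- data.frame(round_num = c(1, 2, NA), decision = c(0, 1, 1))
summary(cooperation_toy)         # min/max reveal implausible values
colSums(is.na(cooperation_toy))  # count of missing values per column
```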

##    sessionnum      condition       playerid    decision..0.D.1.C.
##  Min.   : 1.00   Fluid  :1034   Min.   : 102   Min.   :0.0000    
##  1st Qu.: 9.00   Random :1015   1st Qu.: 918   1st Qu.:0.0000    
##  Median :19.00   Static :1008   Median :1915   Median :1.0000    
##  Mean   :21.83   Viscous: 819   Mean   :2196   Mean   :0.5325    
##  3rd Qu.:34.00                  3rd Qu.:3407   3rd Qu.:1.0000    
##  Max.   :52.00                  Max.   :5240   Max.   :1.0000    
##                                                                  
##  previous_decision   round_num      num_neighbors      group_size   
##  Min.   :0.0000    Min.   : 1.000   Min.   : 0.000   Min.   :10.00  
##  1st Qu.:0.0000    1st Qu.: 2.000   1st Qu.: 3.000   1st Qu.:15.00  
##  Median :1.0000    Median : 3.000   Median : 5.000   Median :19.00  
##  Mean   :0.5361    Mean   : 3.863   Mean   : 5.239   Mean   :20.11  
##  3rd Qu.:1.0000    3rd Qu.: 5.000   3rd Qu.: 7.000   3rd Qu.:25.00  
##  Max.   :1.0000    Max.   :11.000   Max.   :18.000   Max.   :33.00  
##  NA's   :785                                                        
##   fluid_dummy    
##  Min.   :0.0000  
##  1st Qu.:0.0000  
##  Median :0.0000  
##  Mean   :0.2668  
##  3rd Qu.:1.0000  
##  Max.   :1.0000  
## 

Address incomplete observations

The dataset imported here has already been cleaned for this analysis. The authors chose to drop inactive participants from this dataset (see the Supporting Information for Rand, Arbesman, and Christakis (2011) for more details).

Exploratory data analysis

We present a very brief example of some exploratory data analysis tasks. In practice, this section could be fleshed out much more.

Basic attributes

Here are some of the statistics for this data (generated in-line):

  • There are 785 unique players in 40 sessions/experiments.
  • Each session has up to 11 rounds, with individuals playing 2 to 11 rounds.
  • Each session is assigned one of 4 conditions: Static (fixed), Random, Viscous, Fluid.

Each session is assigned a particular network link updating strategy (condition) and may contain a different number of players. Below is the summarized session information:

Note that the number of players varies both within and between experimental conditions:

Overall patterns

In the following sections, we consider how players change their behavior over the course of a game.

Cooperation rate
  • The rate of cooperation tends to decline as the game proceeds
  • The rate of decline differs across experimental conditions.
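A tabulation along these lines can be computed with aggregate() (a sketch on toy data so it runs standalone; the real call would use the cooperation data frame and its decision..0.D.1.C. column):

```r
# Toy data: two conditions, two rounds each
toy <- data.frame(
  condition = rep(c("Fluid", "Static"), each = 4),
  round_num = rep(1:2, times = 4),
  decision  = c(1, 1, 1, 0, 1, 0, 0, 0)
)
# Mean cooperation rate by round within each condition
coop_rate <- aggregate(decision ~ round_num + condition, data = toy, FUN = mean)
coop_rate
```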

Individual behavior
  • Players tend to stay in the same mode from round to round;
  • but are more likely to shift from cooperation to defection than vice versa.
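These transition proportions can be obtained from the previous_decision and decision columns, for example (toy data shown so the snippet is self-contained; NA previous decisions, i.e. first rounds, are dropped by table() automatically):

```r
toy <- data.frame(previous_decision = c(1, 1, 1, 0, 0, NA),
                  decision          = c(1, 1, 0, 0, 1, 1))
# Row-normalized transition table: P(decision | previous_decision)
trans <- prop.table(table(toy$previous_decision, toy$decision), margin = 1)
trans
```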

Impact of social connections
  • Players’ tendency to contribute depends on their number of neighbors and whether they can choose their neighbors.

Modeling approaches

In this analysis, we focus on fitting Bayesian versions of the logistic regression models implemented in Rand, Arbesman, and Christakis (2011). In their paper, the main model focuses on the round number and the condition used for network link updating (i.e., static, random, viscous, or fluid). This model also accounts for dependence across decisions made by the same player, as well as across players in the same session, by incorporating random effects at the individual and session levels.

Let \(y_{ijk}\) be the decision made by player \(j\) in round \(i\) of session \(k\), with \(y_{ijk}=1\) if player \(j\) chooses to cooperate in round \(i\) of session \(k\) and \(y_{ijk}=0\) otherwise. Then our model is simply \[\begin{align*} Y_{ijk} & \sim \text{Bernoulli}(p_{ijk}) \\ \text{logit}(p_{ijk}) &= \beta_1 + \beta_2 x_{1k} + \beta_3 x_{2i} + \beta_4 x_{1k} x_{2i} + \tau^{(indiv)}_j + \tau^{(sess)}_{k} \\ \tau^{(indiv)}_j & \sim \text{Normal}(0,\sigma^{(indiv)}) \\ \tau^{(sess)}_k & \sim \text{Normal}(0,\sigma^{(sess)}) \end{align*}\]

where \(p_{ijk}\) is the probability that player \(j\) cooperates in round \(i\) of session \(k\), \(x_{1k}\) is an indicator for the network behavior setting of the \(k\)th session, with \(x_{1k}=1\) if the \(k\)th session has fluid network link updating and \(x_{1k}=0\) otherwise, and \(x_{2i}\) is the round number. \(\sigma^{(indiv)}\) and \(\sigma^{(sess)}\) capture the variation across players and across sessions, respectively.

Note that the main conclusion from Rand, Arbesman, and Christakis (2011) states that rapidly updating networks promote cooperation relative to other conditions, which is confirmed using a one-sided hypothesis test for \(\beta_4\). In the paper, the authors provide an estimate, \(\hat{\beta}_4=0.135\), with a small p-value, \(0.006\).

In the following sections, we demonstrate how to fit this model and perform the hypothesis test using Stan.

Variable transformations

No variable transformations are necessary for this analysis. Round number is an integer, and we use a dummy indicator for the fluid condition, e.g.

cooperation$fluid_dummy <- 1 * (cooperation$condition == "Fluid")

To keep track of players and sessions, we create index tables for each, so that they can be easily renumbered:

playerIDs <- data.frame(id = unique(cooperation$playerid), index = 1:length(unique(cooperation$playerid)))
indivIndex <- unlist(lapply(1:nrow(cooperation), function(i) {
    playerIDs$index[playerIDs$id == cooperation$playerid[i]]
}))

sessionIDs <- data.frame(id = unique(cooperation$sessionnum), index = 1:length(unique(cooperation$sessionnum)))
sessIndex <- unlist(lapply(1:nrow(cooperation), function(i) {
    sessionIDs$index[sessionIDs$id == cooperation$sessionnum[i]]
}))
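The per-row lapply() above works but is quadratic in the number of rows; base R's match() produces the same indices in a single vectorized call (an equivalent alternative, not the notebook's original code):

```r
# Toy stand-in for cooperation$playerid
ids <- c(102, 918, 102, 1915, 918)
# match() against the unique values renumbers ids as 1, 2, 3, ...
indivIndex_fast <- match(ids, unique(ids))
indivIndex_fast  # 1 2 1 3 2
```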

A ‘Stan’ model

First, we can specify our model directly in R, as shown below, or save it as a separate .stan file using a text editor. The code provided below was written with readability and ease of interpretation in mind; for more details about how to write efficient model code in Stan, see the documentation (e.g., Section 9.9 in the Language Manual).

logit_model <- '
data {
  
  // Dimensions
  int<lower=1> N;        // Number of observations
  int<lower=1> p;        // Number of parameters
  int<lower=1> N_indiv;  // Number of individuals
  int<lower=1> N_sess;   // Number of sessions

  // Indices
  int<lower=1,upper=N_indiv> indiv[N];
  int<lower=1,upper=N_sess> sess[N];

  // Outcome
  int<lower=0,upper=1> y[N];

  // Covariates
  real  x1[N];
  real  x2[N];
}

parameters {
  real  beta[p];
  real  tau_indiv[N_indiv];
  real  tau_sess[N_sess];
  real<lower=0> prec_indiv;
  real<lower=0> prec_sess;
}

transformed parameters {
  real<lower=0>  sigma_indiv;
  real<lower=0>  sigma_sess;
  sigma_indiv = sqrt(1/prec_indiv);
  sigma_sess = sqrt(1/prec_sess);
}

model {
  for (i in 1:N){
    y[i] ~ bernoulli_logit( beta[1] + beta[2]*x1[i] +
                beta[3]*x2[i] + beta[4]*x1[i]*x2[i] +
                tau_indiv[indiv[i]] + 
                tau_sess[sess[i]] );
  }
  tau_indiv ~ normal( 0, sigma_indiv );
  tau_sess ~ normal( 0, sigma_sess );
 
  beta ~ normal( 0, 100 ); 
  prec_indiv ~ gamma(1, 1);
  prec_sess ~ gamma(1, 1);
}
'
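As the documentation discusses, the observation loop can also be written in vectorized form, which is typically faster. A sketch is below; this assumes a Stan version supporting multiple indexing (so that tau_indiv[indiv] expands the random effects to length N) and is not the notebook's original code:

```stan
model {
  // Vectorized likelihood: to_vector() converts the real arrays,
  // and multiple indexing expands the random effects to length N
  y ~ bernoulli_logit( beta[1] + beta[2]*to_vector(x1) +
                       beta[3]*to_vector(x2) +
                       beta[4]*(to_vector(x1) .* to_vector(x2)) +
                       to_vector(tau_indiv[indiv]) +
                       to_vector(tau_sess[sess]) );
  tau_indiv ~ normal( 0, sigma_indiv );
  tau_sess ~ normal( 0, sigma_sess );

  beta ~ normal( 0, 100 );
  prec_indiv ~ gamma(1, 1);
  prec_sess ~ gamma(1, 1);
}
```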

Running the model in R

First, we package the necessary data for use with Stan, in a list format:

dat <- list(N = nrow(cooperation), p = 4, y = cooperation$decision..0.D.1.C., 
    x1 = cooperation$fluid_dummy, x2 = cooperation$round_num, indiv = indivIndex, 
    N_indiv = max(indivIndex), sess = sessIndex, N_sess = max(sessIndex))

Then, we can run the model in R.

resStan <- stan(model_code = logit_model, data = dat, init = list(list(prec_indiv = 1, 
    prec_sess = 1)), chains = 1, iter = 3500, warmup = 500, thin = 1)

Convergence of the MCMC algorithm can be monitored and inspected (note that the gray shaded areas represent the warm-up, or burn-in, period).

## Show traceplot
traceplot(resStan, pars = c("beta", "sigma_indiv", "sigma_sess"), inc_warmup = TRUE)

Optional: The rstanarm package

Rather than specify the model code (or a .stan model file) ourselves, we could also make use of the rstanarm package which provides wrapper functions for Stan for a variety of common models, including logistic regression.

cooperation$decision = cooperation$decision..0.D.1.C.
glm.out = stan_glmer(decision ~ fluid_dummy + round_num + fluid_dummy * round_num + 
    (1 | playerid) + (1 | sessionnum), data = cooperation, family = binomial(link = "logit"), 
    chains = 1, iter = 1000)
## trying deprecated constructor; please alert package maintainer

Preliminary results

We can easily extract posterior samples from the model and look at posterior means for the regression coefficients:

##      intercept fluid_dummy round_num fluid_dummy*round_num
## [1,]    1.1738      0.0777   -0.2095                0.0374

Below, we plot our draws from the marginal posterior distributions for the model parameters. Our posterior estimates (sample means) are indicated by the vertical red lines. Because we are often interested in finding non-zero effects, dashed vertical blue lines are plotted at the origin.

Hypothesis Testing

Recall that the authors are interested in testing whether or not frequent user-updated social networks promote cooperation, relative to other conditions. This can be tested directly by testing for a positive \(\beta_4\) (posterior mean = 0.0374). In our Bayesian setting, this means that we are interested in the posterior probability that \(\beta_4\) is greater than zero, given our model and the observed data. We can estimate this probability using the posterior samples from our model, calculating the proportion of draws from the posterior distribution, \(p(\beta_4|y)\), that are greater than zero:

sum(list_of_draws$beta[, 4] > 0)/length(list_of_draws$beta[, 4])
## [1] 0.768
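Equivalently, since the comparison yields 0/1 values, this proportion can be written with mean() (toy draws shown here so the snippet runs standalone; the real call would use list_of_draws$beta[, 4]):

```r
set.seed(1)
draws <- rnorm(3000, mean = 0.04, sd = 0.05)  # stand-in for list_of_draws$beta[, 4]
mean(draws > 0)  # same value as sum(draws > 0) / length(draws)
```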

Thus, our model provides some evidence in support of the authors’ claim: frequently user-updated social networks do promote cooperation, relative to the other conditions.

Model checking

Posterior predictive checking

A reasonable way to consider whether the model fits the data (i.e., captures or addresses the important features of the observed data) is to consider whether simulations from the model “look like” the observed data. In this Bayesian setting, we can simply examine draws from the posterior predictive distribution, which is obtained by feeding posterior draws for model parameters back into the model to simulate predicted outcomes. In the following sections, we consider a variety of perspectives to address whether simulations from the model look like the observed data. See Gelman et al. (2014 Chapter Six) for further motivation and more details.

Qualitative Checks

If the model fits well, then replicated data generated under the model should look similar to the observed data. We can check this by obtaining posterior predictions for the observed data. We simply need to add the following few lines to our Stan model code:

"generated quantities {
  vector[N] y_tilde;
  for (i in 1:N)
    y_tilde[i] = bernoulli_logit_rng( beta[1] + beta[2]*x1[i] +
                beta[3]*x2[i] + beta[4]*x1[i]*x2[i] +
                tau_indiv[indiv[i]] + 
                tau_sess[sess[i]] );
}"

Then, we can fit this model just as before:

postPredStan <- stan(model_code = logit_postPred, data = dat, init = list(list(prec_indiv = 1, 
    prec_sess = 1)), chains = 1, iter = 3500, warmup = 500, thin = 1)

and easily extract the results:

which, by default, excludes the warm-up or burn-in samples and permutes the ordering of the remaining posterior samples.

Direct Data Display

Because we are concerned with individual-level behavior, to help visualize the data we first organize it into a matrix where each row corresponds to a unique player and each column represents a round of play (across all conditions, there are up to 11 rounds). The plot below displays this matrix as an image, so that we can examine the decision history, cooperate (colored bar) or defect (grey bar), for all 785 players in each round.
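One way to build such a player-by-round matrix uses matrix-index assignment (a sketch on toy data so the snippet is self-contained; the real version would use indivIndex and the cooperation columns):

```r
toy <- data.frame(player_index = c(1, 1, 2, 2, 3),
                  round_num    = c(1, 2, 1, 2, 1),
                  decision     = c(1, 0, 1, 1, 0))
dec_mat <- matrix(NA, nrow = max(toy$player_index), ncol = max(toy$round_num))
# cbind(row, col) fills all (player, round) cells in one assignment;
# rounds a player did not play remain NA
dec_mat[cbind(toy$player_index, toy$round_num)] <- toy$decision
dec_mat
# image(t(dec_mat)) would then display it with rounds on the x-axis
```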

Now we can compare this plot for the original (true) data to realizations from the posterior distribution from our model:


In this case, there don’t appear to be any obvious differences between the true data and our posterior predictions.

Summary Statistics or Inferences

In this setting, the large number of individual participants can make it difficult to spot differences in the data. Instead, we can use summary measures or trends to compare our posterior predictions to the original data.

Below, we average individual behavior by round (recall from the plots above that later rounds typically have far fewer participants) and plot the average cooperation level by round:

Recall that we are not modeling each session type individually, but have only included an indicator for fluid vs. non-fluid sessions in our model. Observed variation across predictions for non-fluid sessions is a result of the estimated session-level random effects.

Since our focus is on the performance of the fluid network setting relative to all other settings, we can consider collapsing all non-fluid rounds together in order to make the comparison clearer:

We can also consider players’ tendency to switch between cooperating or defecting in the posterior samples (recall that we looked at these proportions in our exploratory analysis), by averaging over all rounds and players.

Again, the truth (top left plot) is hard to distinguish from the posterior samples.

Another perspective could consider game sessions individually.

Formal test quantities

Had we seen any inconsistencies between our posterior predictions and the observed data, we could consider developing formal test quantities to further examine these differences. For example, say that we had noticed a difference in the proportion of times individuals move from cooperation to defection across rounds (i.e., differences in the height of the C –> D bars in the barplot above). We could use the observed proportion in each draw or sample from the posterior predictive distribution to test whether or not there is any systematic difference between our posterior predictions and the observed data in this respect. Of course, other test quantities could certainly be developed. For more discussion, see Gelman et al. (2014 Chapter Six).

Uncertainty Evaluation and Prediction

In this section, we use bootstrapped data sets (resampling rows, i.e., individual-round outcomes, with replacement from the original data set) to evaluate model fit.

Treating all observations independently

Here, we simply treat each row as an individual data point, an outcome (cooperate or defect) for a particular individual in a particular round.

if (runModels) {
    logit.bootstrap <- function(data, indices) {
        d <- data[indices, ]
        fit.glm <- glm(decision..0.D.1.C. ~ fluid_dummy * round_num, data = d, 
            family = "binomial")
        return(coef(fit.glm))
    }
    
    logit.boot <- boot(data = cooperation, statistic = logit.bootstrap, R = 30)  # R: number of samples
    
    
    fileName <- "./lib/logit.stan"
    stan_code <- readChar(fileName, file.info(fileName)$size)
    cat(stan_code)
    
    logit.stan.bootstrap <- function(data, indices) {
        d <- data[indices, ]
        dat <- list(N = nrow(d), p = 4, decision = d$decision..0.D.1.C., fluid_dummy = d$fluid_dummy, 
            round_num = d$round_num)
        fit.stan.logit <- stan(model_code = stan_code, data = dat, chains = 3, 
            iter = 3000, warmup = 500, thin = 10)
        list_of_draws.bootstrap <- extract(fit.stan.logit)
        # Return the posterior mean of each regression coefficient
        return(colMeans(list_of_draws.bootstrap$beta))
    }
    
    logit.stan.boot <- boot(data = cooperation, statistic = logit.stan.bootstrap, 
        R = 3)  # R: number of samples
    
}

Accounting for sessions

However, sampling each row of the data set completely independently ignores the fact that players are grouped into sessions, with each player able to interact only with other players in his/her assigned session.

In this section, we resample sessions (each session consists of a group of rows, i.e. a collection of individual-round outcomes).

First, specify the number of bootstrapped sets:

num.boot.sets = 2

Then, we can construct the bootstrapped data set:

if (runModels) {
    samp = matrix(nrow = num.boot.sets, ncol = length(unique(cooperation$sessionnum)))
    logit.bootstrap.session = matrix(nrow = num.boot.sets, ncol = 4)
    bootstrap.session.beta = array(rep(0, num.boot.sets * 4 * 750), c(750, 4, 
        num.boot.sets))
    
    for (i in 1:num.boot.sets) {
        samp[i, ] <- sample(unique(cooperation$sessionnum), 40, replace = TRUE)
        tmp.assign <- do.call(rbind, lapply(samp[i, ], function(x) cooperation[cooperation$sessionnum == 
            x, ]))
        logit.bootstrap.session.tmp <- glm(decision..0.D.1.C. ~ fluid_dummy * 
            round_num, data = tmp.assign, family = "binomial")
        logit.bootstrap.session[i, ] = logit.bootstrap.session.tmp$coefficients
        
        dat.bootstrap.session.tmp <- list(N = nrow(tmp.assign), p = 4, decision = tmp.assign$decision..0.D.1.C., 
            fluid_dummy = tmp.assign$fluid_dummy, round_num = tmp.assign$round_num)
        fileName <- "./lib/logit.stan"
        stan_code <- readChar(fileName, file.info(fileName)$size)
        cat(stan_code)
        resStan.bootstrap.session.tmp <- stan(model_code = stan_code, data = dat.bootstrap.session.tmp, 
            chains = 3, iter = 3000, warmup = 500, thin = 10)
        
        list_of_draws.bootstrap.session.tmp <- extract(resStan.bootstrap.session.tmp)
        bootstrap.session.beta[, , i] <- list_of_draws.bootstrap.session.tmp$beta
    }
}

Summary

This table summarizes the bootstrapping results.

Posterior distribution draws are as follows:

Comparing KL-divergence

## , , 1
## 
##            orig_bayes       boot1       boot2
## orig_bayes 0.00000000 0.021056239 0.022523957
## boot1      0.02186232 0.000000000 0.007666615
## boot2      0.02325961 0.007626398 0.000000000
## 
## , , 2
## 
##            orig_bayes      boot1      boot2
## orig_bayes 0.00000000 0.09715858 0.09403902
## boot1      0.09552282 0.00000000 0.01746229
## boot2      0.09261596 0.01750814 0.00000000
## 
## , , 3
## 
##              orig_bayes        boot1        boot2
## orig_bayes 0.0000000000 0.0003780843 0.0004567395
## boot1      0.0003777219 0.0000000000 0.0002521667
## boot2      0.0004556588 0.0002524070 0.0000000000
## 
## , , 4
## 
##             orig_bayes      boot1       boot2
## orig_bayes 0.000000000 0.01300678 0.001687616
## boot1      0.012650247 0.00000000 0.011715751
## boot2      0.001688216 0.01204889 0.000000000

Why bother?

Reproducible research

The degree of reproducibility of scientists’ results and findings has increasingly become an important area of interest. In part this is due to the “replication crisis” (for more discussion, see some of the posts on Andrew Gelman’s blog - here and here) and, generally speaking, the idea of reproducible research is hard to argue against. However, what exactly fully reproducible research looks like in practice is still up for debate. In the analysis presented here, we have tried to take steps in this direction by

  • making code, data, plots, and results available
  • documenting the complete workflow (from research hypotheses, to data collection and cleaning, all the way through to modeling and the consideration of model fit)
  • integrating code within the analysis narrative (in an accessible and interactive way, when possible)
  • increasing transparency (by briefly describing the motivation for each step in our particular data analysis path)
  • describing software requirements.

Stodden et al. (2016) (available here) provide a good discussion of some characteristics of an (ideal) reproducible analysis, including some of those we have tried to incorporate here.

Bayesian modeling with Stan

Like other probabilistic programming languages, Stan provides built-in computational tools for inference (e.g., parameter estimation, hypothesis testing) in Bayesian models, which in most cases would otherwise require much more work (i.e., if we had to code the samplers and algorithms by hand). In addition to R (used here), Stan can interface with a variety of data analysis languages, such as Python, shell, MATLAB, Julia, and Stata, which helps make Stan even more accessible. Additionally, Stan models utilize highly efficient sampling algorithms and, in R, can be further investigated with very useful helper packages, such as shinyStan, which provides easy access to visual and numerical summaries of model features and diagnostics, and rstanarm, which provides wrapper functions for many common pre-built Bayesian models.

For more discussion of Bayesian methods for data analysis and best practices in the field, we recommend Gelman et al. (2014).

Some references

Gelman, Andrew, John B Carlin, Hal S Stern, and David B Dunson. 2014. Bayesian Data Analysis. Vol. 2.

Rand, David G, Samuel Arbesman, and Nicholas A Christakis. 2011. “Dynamic Social Networks Promote Cooperation in Experiments with Humans.” Proceedings of the National Academy of Sciences 108 (48). National Acad Sciences: 19193–8.

Stodden, Victoria, Marcia McNutt, David H Bailey, Ewa Deelman, Yolanda Gil, Brooks Hanson, Michael A Heroux, John PA Ioannidis, and Michela Taufer. 2016. “Enhancing Reproducibility for Computational Methods.” Science 354 (6317). American Association for the Advancement of Science: 1240–1.